Dataset statistics
| Number of variables | 18 |
|---|---|
| Number of observations | 964 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 129.1 KiB |
| Average record size in memory | 137.1 B |
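As a quick consistency check, the average record size above follows from dividing the total in-memory size by the number of observations. The sketch below assumes 1 KiB = 1024 bytes; the variable names are illustrative and not part of the report.

```python
# Values copied from the "Dataset statistics" table.
total_size_kib = 129.1
n_observations = 964

# Assumption: 1 KiB = 1024 bytes.
total_size_bytes = total_size_kib * 1024
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B")
```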
Variable types
| CAT | 11 |
|---|---|
| NUM | 6 |
| BOOL | 1 |
Reproduction
| Analysis started | 2020-07-16 09:49:31.047732 |
|---|---|
| Analysis finished | 2020-07-16 09:49:40.183596 |
| Duration | 9.14 seconds |
| Version | pandas-profiling v2.8.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
| Download configuration | config.yaml |
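The reported duration can be re-derived from the two timestamps. This is a minimal sketch that assumes both timestamps share the same time zone; the format string is chosen to match how they are printed in the table.

```python
from datetime import datetime

# Timestamp layout as printed in the "Reproduction" table.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2020-07-16 09:49:31.047732", FMT)
finished = datetime.strptime("2020-07-16 09:49:40.183596", FMT)

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```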
Warnings
| Warning | Type |
|---|---|
| current_year has constant value "2016" | Constant |
| current_time has a high cardinality: 953 distinct values | High cardinality |
| source_name has a high cardinality: 170 distinct values | High cardinality |
| destination_name has a high cardinality: 168 distinct values | High cardinality |
| train_name has a high cardinality: 504 distinct values | High cardinality |
| current_week is highly correlated with current_date | High correlation |
| current_date is highly correlated with current_week and 1 other fields | High correlation |
| current_day is highly correlated with current_date | High correlation |
| current_time is uniformly distributed | Uniform |
| id_code has unique values | Unique |
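The Constant, Unique, and High cardinality warnings can be reproduced from the distinct counts alone. The sketch below is illustrative rather than the profiler's actual implementation: the `classify` helper is hypothetical, and the threshold of 50 distinct values is an assumption that the report does not state.

```python
N_ROWS = 964
# Assumption: a categorical column is "high cardinality" above 50 distinct values.
CARDINALITY_THRESHOLD = 50

# Distinct counts as reported for a few categorical columns.
distinct_counts = {
    "id_code": 964,
    "current_date": 24,
    "current_time": 953,
    "source_name": 170,
    "destination_name": 168,
    "train_name": 504,
    "current_year": 1,
}

def classify(distinct, n_rows=N_ROWS, threshold=CARDINALITY_THRESHOLD):
    """Return a warning label for one categorical column, or None."""
    if distinct == 1:
        return "Constant"
    if distinct == n_rows:
        return "Unique"
    if distinct > threshold:
        return "High cardinality"
    return None

labels = {name: classify(count) for name, count in distinct_counts.items()}
for name, label in labels.items():
    if label is not None:
        print(f"{name}: {label}")
```

The correlation and uniformity warnings need the underlying data and are not covered by this sketch.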
id_code
Categorical
| Distinct count | 964 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.5 KiB |
| wrdomkpyagubyzx | 1 |
|---|---|
| bhhpnhuvhirkhin | 1 |
| tvpkzaytnnyhtuj | 1 |
| ajfcikhfciubgxk | 1 |
| ehhcktdhpckymcf | 1 |
| Other values (959) | 959 |
| Value | Count | Frequency (%) | |
| wrdomkpyagubyzx | 1 | 0.1% | |
| bhhpnhuvhirkhin | 1 | 0.1% | |
| tvpkzaytnnyhtuj | 1 | 0.1% | |
| ajfcikhfciubgxk | 1 | 0.1% | |
| ehhcktdhpckymcf | 1 | 0.1% | |
| mdrlwiczxvxhrqx | 1 | 0.1% | |
| xocodwijjeuwvuv | 1 | 0.1% | |
| gghsevkjxkcmxju | 1 | 0.1% | |
| gssykqcbduuwtoq | 1 | 0.1% | |
| znfjfgtmesawnns | 1 | 0.1% | |
| Other values (954) | 954 | 99.0% |
Length
| Max length | 15 |
|---|---|
| Median length | 15 |
| Mean length | 15 |
| Min length | 15 |
current_date
Categorical
| Distinct count | 24 |
|---|---|
| Unique (%) | 2.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.5 KiB |
| 2016-10-14 | 73 |
|---|---|
| 2016-10-06 | 68 |
| 2016-10-13 | 66 |
| 2016-10-11 | 64 |
| 2016-10-10 | 62 |
| Other values (19) | 631 |
| Value | Count | Frequency (%) | |
| 2016-10-14 | 73 | 7.6% | |
| 2016-10-06 | 68 | 7.1% | |
| 2016-10-13 | 66 | 6.8% | |
| 2016-10-11 | 64 | 6.6% | |
| 2016-10-10 | 62 | 6.4% | |
| 2016-10-21 | 55 | 5.7% | |
| 2016-10-07 | 54 | 5.6% | |
| 2016-10-17 | 50 | 5.2% | |
| 2016-10-25 | 49 | 5.1% | |
| 2016-10-19 | 47 | 4.9% | |
| Other values (14) | 376 | 39.0% |
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 10 |
| Min length | 10 |
current_time
Categorical
| Distinct count | 953 |
|---|---|
| Unique (%) | 98.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.5 KiB |
| 10:31:36 PM | 2 |
|---|---|
| 08:36:18 AM | 2 |
| 09:36:55 PM | 2 |
| 05:05:44 PM | 2 |
| 06:08:58 PM | 2 |
| Other values (948) | 954 |
| Value | Count | Frequency (%) | |
| 10:31:36 PM | 2 | 0.2% | |
| 08:36:18 AM | 2 | 0.2% | |
| 09:36:55 PM | 2 | 0.2% | |
| 05:05:44 PM | 2 | 0.2% | |
| 06:08:58 PM | 2 | 0.2% | |
| 04:23:41 PM | 2 | 0.2% | |
| 08:56:07 AM | 2 | 0.2% | |
| 04:28:24 PM | 2 | 0.2% | |
| 08:18:10 AM | 2 | 0.2% | |
| 07:22:08 AM | 2 | 0.2% | |
| Other values (943) | 944 | 97.9% |
Length
| Max length | 11 |
|---|---|
| Median length | 11 |
| Mean length | 11 |
| Min length | 11 |
source_name
Categorical
| Distinct count | 170 |
|---|---|
| Unique (%) | 17.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.5 KiB |
| station$544 | 100 |
|---|---|
| station$266 | 71 |
| station$147 | 68 |
| station$150 | 61 |
| station$130 | 61 |
| Other values (165) | 603 |
| Value | Count | Frequency (%) | |
| station$544 | 100 | 10.4% | |
| station$266 | 71 | 7.4% | |
| station$147 | 68 | 7.1% | |
| station$150 | 61 | 6.3% | |
| station$130 | 61 | 6.3% | |
| station$178 | 28 | 2.9% | |
| station$525 | 26 | 2.7% | |
| station$214 | 21 | 2.2% | |
| station$281 | 20 | 2.1% | |
| station$117 | 18 | 1.9% | |
| Other values (160) | 490 | 50.8% |
Length
| Max length | 11 |
|---|---|
| Median length | 11 |
| Mean length | 10.97406639 |
| Min length | 10 |
destination_name
Categorical
| Distinct count | 168 |
|---|---|
| Unique (%) | 17.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.5 KiB |
| station$130 | 86 |
|---|---|
| station$150 | 75 |
| station$147 | 75 |
| station$544 | 72 |
| station$266 | 56 |
| Other values (163) | 600 |
| Value | Count | Frequency (%) | |
| station$130 | 86 | 8.9% | |
| station$150 | 75 | 7.8% | |
| station$147 | 75 | 7.8% | |
| station$544 | 72 | 7.5% | |
| station$266 | 56 | 5.8% | |
| station$185 | 29 | 3.0% | |
| station$178 | 29 | 3.0% | |
| station$525 | 28 | 2.9% | |
| station$214 | 20 | 2.1% | |
| station$177 | 16 | 1.7% | |
| Other values (158) | 478 | 49.6% |
Length
| Max length | 11 |
|---|---|
| Median length | 11 |
| Mean length | 10.96887967 |
| Min length | 10 |
train_name
Categorical
| Distinct count | 504 |
|---|---|
| Unique (%) | 52.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.5 KiB |
| ICZVZV | 22 |
|---|---|
| ICWYR | 17 |
| ICWAT | 16 |
| SYXUUV | 13 |
| SSXRTS | 12 |
| Other values (499) | 884 |
| Value | Count | Frequency (%) | |
| ICZVZV | 22 | 2.3% | |
| ICWYR | 17 | 1.8% | |
| ICWAT | 16 | 1.7% | |
| SYXUUV | 13 | 1.3% | |
| SSXRTS | 12 | 1.2% | |
| ICXUXZ | 10 | 1.0% | |
| ICWAR | 9 | 0.9% | |
| ICXYAT | 8 | 0.8% | |
| PTWWW | 8 | 0.8% | |
| PTXAV | 7 | 0.7% | |
| Other values (494) | 842 | 87.3% |
Length
| Max length | 8 |
|---|---|
| Median length | 6 |
| Mean length | 5.623443983 |
| Min length | 3 |
country_code_source
Categorical
| Distinct count | 3 |
|---|---|
| Unique (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.5 KiB |
| whber | 960 |
|---|---|
| qwnll | 3 |
| wsluu | 1 |
| Value | Count | Frequency (%) | |
| whber | 960 | 99.6% | |
| qwnll | 3 | 0.3% | |
| wsluu | 1 | 0.1% |
Length
| Max length | 5 |
|---|---|
| Median length | 5 |
| Mean length | 5 |
| Min length | 5 |
longitude_source
Real number (ℝ≥0)
| Distinct count | 170 |
|---|---|
| Unique (%) | 17.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.281108537344398 |
|---|---|
| Minimum | 2.65277 |
| Maximum | 6.133331 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.5 KiB |
Quantile statistics
| Minimum | 2.65277 |
|---|---|
| 5-th percentile | 3.23107465 |
| Q1 | 3.8419555 |
| median | 4.356801 |
| Q3 | 4.499323 |
| 95-th percentile | 5.50790115 |
| Maximum | 6.133331 |
| Range | 3.480561 |
| Interquartile range (IQR) | 0.6573675 |
Descriptive statistics
| Standard deviation | 0.578196479 |
|---|---|
| Coefficient of variation (CV) | 0.1350576548 |
| Kurtosis | 0.3915324059 |
| Mean | 4.281108537 |
| Median Absolute Deviation (MAD) | 0.2864185 |
| Skewness | 0.09998739626 |
| Sum | 4126.98863 |
| Variance | 0.3343111684 |
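Several of the longitude_source statistics above are simple functions of the others, which makes them easy to sanity-check. The sketch below recomputes the range, IQR, coefficient of variation, and variance from the reported values; the inputs are rounded as printed, so agreement is only expected to about six decimal places.

```python
import math

# Values copied from the longitude_source tables.
minimum, maximum = 2.65277, 6.133331
q1, q3 = 3.8419555, 4.499323
mean, std = 4.281108537, 0.578196479

value_range = maximum - minimum  # reported Range: 3.480561
iqr = q3 - q1                    # reported IQR: 0.6573675
cv = std / mean                  # reported CV: 0.1350576548
variance = std ** 2              # reported Variance: 0.3343111684

reported = {
    "range": 3.480561,
    "iqr": 0.6573675,
    "cv": 0.1350576548,
    "variance": 0.3343111684,
}
computed = {"range": value_range, "iqr": iqr, "cv": cv, "variance": variance}

for name, value in computed.items():
    agrees = math.isclose(value, reported[name], abs_tol=1e-6)
    print(f"{name}: computed={value:.7f} agrees={agrees}")
```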
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 3.710675 | 100 | 10.4% | |
| 4.715866 | 71 | 7.4% | |
| 4.356801 | 68 | 7.1% | |
| 4.336531 | 61 | 6.3% | |
| 4.360846 | 61 | 6.3% | |
| 4.421101 | 28 | 2.9% | |
| 3.216726 | 26 | 2.7% | |
| 4.482785 | 21 | 2.2% | |
| 5.566695 | 20 | 2.1% | |
| 4.56936 | 18 | 1.9% | |
| Other values (160) | 490 | 50.8% |
| Value | Count | Frequency (%) | |
| 2.65277 | 1 | 0.1% | |
| 2.66994 | 1 | 0.1% | |
| 2.868943 | 2 | 0.2% | |
| 2.925809 | 3 | 0.3% | |
| 2.999286 | 1 | 0.1% |
| Value | Count | Frequency (%) | |
| 6.133331 | 1 | 0.1% | |
| 6.03711 | 1 | 0.1% | |
| 5.975381 | 1 | 0.1% | |
| 5.854917 | 1 | 0.1% | |
| 5.809971 | 2 | 0.2% |
latitude_source
Real number (ℝ≥0)
| Distinct count | 170 |
|---|---|
| Unique (%) | 17.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 50.88968450311203 |
|---|---|
| Minimum | 49.599996 |
| Maximum | 51.925093 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.5 KiB |
Quantile statistics
| Minimum | 49.599996 |
|---|---|
| 5-th percentile | 50.59978665 |
| Q1 | 50.824506 |
| median | 50.88228 |
| Q3 | 51.035896 |
| 95-th percentile | 51.19923 |
| Maximum | 51.925093 |
| Range | 2.325097 |
| Interquartile range (IQR) | 0.21139 |
Descriptive statistics
| Standard deviation | 0.2026782182 |
|---|---|
| Coefficient of variation (CV) | 0.00398269748 |
| Kurtosis | 5.135848634 |
| Mean | 50.8896845 |
| Median Absolute Deviation (MAD) | 0.1285955 |
| Skewness | -0.5555802221 |
| Sum | 49057.65586 |
| Variance | 0.04107846014 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 51.035896 | 100 | 10.4% | |
| 50.88228 | 71 | 7.4% | |
| 50.845658 | 68 | 7.1% | |
| 50.859663 | 61 | 6.3% | |
| 50.835707 | 61 | 6.3% | |
| 51.2172 | 28 | 2.9% | |
| 51.197226 | 26 | 2.7% | |
| 51.017648 | 21 | 2.2% | |
| 50.62455 | 20 | 2.1% | |
| 50.673667 | 18 | 1.9% | |
| Other values (160) | 490 | 50.8% |
| Value | Count | Frequency (%) | |
| 49.599996 | 1 | 0.1% | |
| 49.68053 | 2 | 0.2% | |
| 50.202821 | 2 | 0.2% | |
| 50.40471 | 9 | 0.9% | |
| 50.412171 | 1 | 0.1% |
| Value | Count | Frequency (%) | |
| 51.925093 | 2 | 0.2% | |
| 51.533333 | 1 | 0.1% | |
| 51.462767 | 1 | 0.1% | |
| 51.364623 | 1 | 0.1% | |
| 51.322032 | 1 | 0.1% |
mean_halt_times_source
Real number (ℝ≥0)
| Distinct count | 144 |
|---|---|
| Unique (%) | 14.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 271.76839337538695 |
|---|---|
| Minimum | 11.973988439306 |
| Maximum | 686.61560693642 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.5 KiB |
Quantile statistics
| Minimum | 11.97398844 |
|---|---|
| 5-th percentile | 26.90578035 |
| Q1 | 72.32947977 |
| median | 202.1878613 |
| Q3 | 351.916185 |
| 95-th percentile | 686.6156069 |
| Maximum | 686.6156069 |
| Range | 674.6416185 |
| Interquartile range (IQR) | 279.5867052 |
Descriptive statistics
| Standard deviation | 222.6073966 |
|---|---|
| Coefficient of variation (CV) | 0.8191070119 |
| Kurtosis | -0.8941779117 |
| Mean | 271.7683934 |
| Median Absolute Deviation (MAD) | 148.9479769 |
| Skewness | 0.6802543401 |
| Sum | 261984.7312 |
| Variance | 49554.05304 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 309.0144509 | 100 | 10.4% | |
| 351.916185 | 71 | 7.4% | |
| 634.1647399 | 68 | 7.1% | |
| 686.6156069 | 61 | 6.3% | |
| 640.265896 | 61 | 6.3% | |
| 467.982659 | 28 | 2.9% | |
| 164.4190751 | 26 | 2.7% | |
| 306.5231214 | 21 | 2.2% | |
| 269.1242775 | 20 | 2.1% | |
| 421.6445087 | 18 | 1.9% | |
| Other values (134) | 490 | 50.8% |
| Value | Count | Frequency (%) | |
| 11.97398844 | 2 | 0.2% | |
| 16.42774566 | 1 | 0.1% | |
| 18.13872832 | 2 | 0.2% | |
| 18.21676301 | 1 | 0.1% | |
| 18.28323699 | 2 | 0.2% |
| Value | Count | Frequency (%) | |
| 686.6156069 | 61 | 6.3% | |
| 640.265896 | 61 | 6.3% | |
| 634.1647399 | 68 | 7.1% | |
| 467.982659 | 28 | 2.9% | |
| 421.6445087 | 18 | 1.9% |
country_code_destination
Categorical
| Distinct count | 4 |
|---|---|
| Unique (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.5 KiB |
| whber | 952 |
|---|---|
| qwnll | 6 |
| wsluu | 4 |
| aqfre | 2 |
| Value | Count | Frequency (%) | |
| whber | 952 | 98.8% | |
| qwnll | 6 | 0.6% | |
| wsluu | 4 | 0.4% | |
| aqfre | 2 | 0.2% |
Length
| Max length | 5 |
|---|---|
| Median length | 5 |
| Mean length | 5 |
| Min length | 5 |
longitude_destination
Real number (ℝ≥0)
| Distinct count | 168 |
|---|---|
| Unique (%) | 17.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.277023254771785 |
|---|---|
| Minimum | 2.3553093 |
| Maximum | 6.133331 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.5 KiB |
Quantile statistics
| Minimum | 2.3553093 |
|---|---|
| 5-th percentile | 3.216726 |
| Q1 | 3.942542 |
| median | 4.356801 |
| Q3 | 4.482785 |
| 95-th percentile | 5.327627 |
| Maximum | 6.133331 |
| Range | 3.7780217 |
| Interquartile range (IQR) | 0.540243 |
Descriptive statistics
| Standard deviation | 0.5722349278 |
|---|---|
| Coefficient of variation (CV) | 0.1337928025 |
| Kurtosis | 0.6801358516 |
| Mean | 4.277023255 |
| Median Absolute Deviation (MAD) | 0.2236695 |
| Skewness | -0.03962129251 |
| Sum | 4123.050418 |
| Variance | 0.3274528126 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 4.360846 | 86 | 8.9% | |
| 4.336531 | 75 | 7.8% | |
| 4.356801 | 75 | 7.8% | |
| 3.710675 | 72 | 7.5% | |
| 4.715866 | 56 | 5.8% | |
| 4.421101 | 29 | 3.0% | |
| 4.432221 | 29 | 3.0% | |
| 3.216726 | 28 | 2.9% | |
| 4.482785 | 20 | 2.1% | |
| 4.482076 | 16 | 1.7% | |
| Other values (158) | 478 | 49.6% |
| Value | Count | Frequency (%) | |
| 2.3553093 | 2 | 0.2% | |
| 2.736343 | 1 | 0.1% | |
| 2.868943 | 1 | 0.1% | |
| 2.925809 | 8 | 0.8% | |
| 2.999286 | 1 | 0.1% |
| Value | Count | Frequency (%) | |
| 6.133331 | 4 | 0.4% | |
| 5.854917 | 2 | 0.2% | |
| 5.80615 | 1 | 0.1% | |
| 5.741581 | 1 | 0.1% | |
| 5.683331 | 1 | 0.1% |
latitude_destination
Real number (ℝ≥0)
| Distinct count | 168 |
|---|---|
| Unique (%) | 17.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 50.88996912116183 |
|---|---|
| Minimum | 48.8809984 |
| Maximum | 52.379128 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.5 KiB |
Quantile statistics
| Minimum | 48.8809984 |
|---|---|
| 5-th percentile | 50.570729 |
| Q1 | 50.835707 |
| median | 50.859663 |
| Q3 | 51.017648 |
| 95-th percentile | 51.19923 |
| Maximum | 52.379128 |
| Range | 3.4981296 |
| Interquartile range (IQR) | 0.181941 |
Descriptive statistics
| Standard deviation | 0.2401170186 |
|---|---|
| Coefficient of variation (CV) | 0.004718356539 |
| Kurtosis | 18.52005554 |
| Mean | 50.88996912 |
| Median Absolute Deviation (MAD) | 0.111736 |
| Skewness | -0.7938371832 |
| Sum | 49057.93023 |
| Variance | 0.05765618261 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 50.859663 | 86 | 8.9% | |
| 50.845658 | 75 | 7.8% | |
| 50.835707 | 75 | 7.8% | |
| 51.035896 | 72 | 7.5% | |
| 50.88228 | 56 | 5.8% | |
| 51.19923 | 29 | 3.0% | |
| 51.2172 | 29 | 3.0% | |
| 51.197226 | 28 | 2.9% | |
| 51.017648 | 20 | 2.1% | |
| 50.896456 | 16 | 1.7% | |
| Other values (158) | 478 | 49.6% |
| Value | Count | Frequency (%) | |
| 48.8809984 | 2 | 0.2% | |
| 49.599996 | 4 | 0.4% | |
| 50.285603 | 1 | 0.1% | |
| 50.29105 | 1 | 0.1% | |
| 50.376798 | 1 | 0.1% |
| Value | Count | Frequency (%) | |
| 52.379128 | 4 | 0.4% | |
| 52.083329 | 1 | 0.1% | |
| 51.312432 | 1 | 0.1% | |
| 51.281626 | 1 | 0.1% | |
| 51.241021 | 1 | 0.1% |
mean_halt_times_destination
Real number (ℝ≥0)
| Distinct count | 149 |
|---|---|
| Unique (%) | 15.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 287.41924004029477 |
|---|---|
| Minimum | 10.28323699422 |
| Maximum | 686.61560693642 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.5 KiB |
Quantile statistics
| Minimum | 10.28323699 |
|---|---|
| 5-th percentile | 29.24566474 |
| Q1 | 72.32947977 |
| median | 180.5982659 |
| Q3 | 467.982659 |
| 95-th percentile | 686.6156069 |
| Maximum | 686.6156069 |
| Range | 676.3323699 |
| Interquartile range (IQR) | 395.6531792 |
Descriptive statistics
| Standard deviation | 238.8763177 |
|---|---|
| Coefficient of variation (CV) | 0.8311076102 |
| Kurtosis | -1.25318412 |
| Mean | 287.41924 |
| Median Absolute Deviation (MAD) | 131.8988439 |
| Skewness | 0.5434287667 |
| Sum | 277072.1474 |
| Variance | 57061.89516 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 640.265896 | 86 | 8.9% | |
| 686.6156069 | 75 | 7.8% | |
| 634.1647399 | 75 | 7.8% | |
| 309.0144509 | 72 | 7.5% | |
| 351.916185 | 56 | 5.8% | |
| 467.982659 | 29 | 3.0% | |
| 421.6445087 | 29 | 3.0% | |
| 164.4190751 | 28 | 2.9% | |
| 306.5231214 | 20 | 2.1% | |
| 153.1156069 | 16 | 1.7% | |
| Other values (139) | 478 | 49.6% |
| Value | Count | Frequency (%) | |
| 10.28323699 | 3 | 0.3% | |
| 17.77456647 | 1 | 0.1% | |
| 18.28323699 | 1 | 0.1% | |
| 19.43352601 | 1 | 0.1% | |
| 19.82369942 | 5 | 0.5% |
| Value | Count | Frequency (%) | |
| 686.6156069 | 75 | 7.8% | |
| 640.265896 | 86 | 8.9% | |
| 634.1647399 | 75 | 7.8% | |
| 467.982659 | 29 | 3.0% | |
| 421.6445087 | 29 | 3.0% |
current_year
Categorical
| Distinct count | 1 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.5 KiB |
| 2016 | 964 |
|---|---|
| Value | Count | Frequency (%) | |
| 2016 | 964 | 100.0% |
Length
| Max length | 4 |
|---|---|
| Median length | 4 |
| Mean length | 4 |
| Min length | 4 |
current_week
Categorical
| Distinct count | 4 |
|---|---|
| Unique (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.5 KiB |
| 41 | 348 |
|---|---|
| 42 | 267 |
| 43 | 191 |
| 40 | 158 |
| Value | Count | Frequency (%) | |
| 41 | 348 | 36.1% | |
| 42 | 267 | 27.7% | |
| 43 | 191 | 19.8% | |
| 40 | 158 | 16.4% |
Length
| Max length | 2 |
|---|---|
| Median length | 2 |
| Mean length | 2 |
| Min length | 2 |
current_day
Categorical
| Distinct count | 7 |
|---|---|
| Unique (%) | 0.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.5 KiB |
| Friday | 212 |
|---|---|
| Thursday | 190 |
| Tuesday | 158 |
| Monday | 151 |
| Wednesday | 129 |
| Other values (2) | 124 |
| Value | Count | Frequency (%) | |
| Friday | 212 | 22.0% | |
| Thursday | 190 | 19.7% | |
| Tuesday | 158 | 16.4% | |
| Monday | 151 | 15.7% | |
| Wednesday | 129 | 13.4% | |
| Sunday | 62 | 6.4% | |
| Saturday | 62 | 6.4% |
Length
| Max length | 9 |
|---|---|
| Median length | 7 |
| Mean length | 7.088174274 |
| Min length | 6 |
is_weekend
Boolean
| Distinct count | 2 |
|---|---|
| Unique (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 964.0 B |
| False | 840 |
|---|---|
| True | 124 |
| Value | Count | Frequency (%) | |
| False | 840 | 87.1% | |
| True | 124 | 12.9% |
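The high-correlation warnings among current_date, current_week, and current_day are what one would expect if the calendar columns are derived from current_date. The report does not say how they were built, so the sketch below is an assumption: it uses ISO week numbering and treats Saturday and Sunday as the weekend. The `calendar_features` helper is hypothetical. These rules are consistent with the counts above (62 Saturday rows plus 62 Sunday rows give the 124 True values).

```python
from datetime import date

# Weekday names indexed by date.weekday() (Monday is 0).
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def calendar_features(d):
    """Derive (year, ISO week, weekday name, weekend flag) from a date."""
    week = d.isocalendar()[1]      # ISO 8601 week number (assumed convention)
    weekday = d.weekday()          # 0 = Monday ... 6 = Sunday
    return d.year, week, DAY_NAMES[weekday], weekday >= 5

first = calendar_features(date(2016, 10, 6))
last = calendar_features(date(2016, 10, 29))
print(first)
print(last)
```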
Pearson's r
Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the slope of the line (its angle to the x-axis) does not affect the magnitude of r.
To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
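The definition of r can be sketched in a few lines of plain Python. This is an illustrative implementation on made-up data, not the profiler's own code; the `pearson_r` name is hypothetical. The second call illustrates the invariance under a positive change of location and scale.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Made-up sample data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(x, y)
# Shift and rescale both variables with positive scale factors.
r_shifted = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_shifted)
```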
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.
To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
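A minimal sketch of this recipe ranks each variable and then applies Pearson's formula to the ranks. Giving tied values the average of their positions is an assumption about tie handling; the helper names are hypothetical and the data is made up.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# A monotonic but nonlinear relationship: y = x cubed.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

On this data ρ reaches its maximum while r stays below it, which is the difference the paragraph above describes.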
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.
To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations; τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
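The pair-counting definition translates directly into code. The sketch below implements the formula exactly as described (often called τ-a), on made-up data; statistical libraries commonly use a tie-corrected variant instead, so results may differ when the data contains ties. The `kendall_tau_a` name is hypothetical.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 means a tie; it counts toward neither group.
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Made-up sample data: one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```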
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. See the Phik project documentation for details.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. The report uses the bias-corrected measure proposed by Bergsma in 2013.
First rows
| | id_code | current_date | current_time | source_name | destination_name | train_name | country_code_source | longitude_source | latitude_source | mean_halt_times_source | country_code_destination | longitude_destination | latitude_destination | mean_halt_times_destination | current_year | current_week | current_day | is_weekend |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | mckbezdplrukagl | 2016-10-06 | 01:05:38 AM | station$143 | station$142 | SZAYASZ | whber | 4.243393 | 50.866728 | 39.121387 | whber | 4.273543 | 50.868337 | 39.121387 | 2016 | 40 | Thursday | False |
| 1 | agxwrnbmzbyxsjg | 2016-10-06 | 01:05:56 AM | station$133 | station$147 | ICXYXY | whber | 4.326220 | 50.880833 | 95.676301 | whber | 4.356801 | 50.845658 | 634.164740 | 2016 | 40 | Thursday | False |
| 2 | iqjojyewdyfshtj | 2016-10-06 | 06:11:54 AM | station$632 | station$544 | ICWAT | whber | 3.264549 | 50.824506 | 153.115607 | whber | 3.710675 | 51.035896 | 309.014451 | 2016 | 40 | Thursday | False |
| 3 | hssqexnzirioaag | 2016-10-06 | 07:00:00 AM | station$296 | station$281 | ICWYR | whber | 5.599695 | 50.613152 | 87.130058 | whber | 5.566695 | 50.624550 | 269.124277 | 2016 | 40 | Thursday | False |
| 4 | lublknpfraiznhr | 2016-10-06 | 07:00:09 AM | station$281 | station$266 | ICWYR | whber | 5.566695 | 50.624550 | 269.124277 | whber | 4.715866 | 50.882280 | 351.916185 | 2016 | 40 | Thursday | False |
| 5 | hgqkwjbpavdwmob | 2016-10-06 | 07:00:15 AM | station$266 | station$130 | ICWYR | whber | 4.715866 | 50.882280 | 351.916185 | whber | 4.360846 | 50.859663 | 640.265896 | 2016 | 40 | Thursday | False |
| 6 | tcoajkwstpxkrdx | 2016-10-06 | 07:00:19 AM | station$130 | station$147 | ICWYR | whber | 4.360846 | 50.859663 | 640.265896 | whber | 4.356801 | 50.845658 | 634.164740 | 2016 | 40 | Thursday | False |
| 7 | muqhmlfqyzozvkn | 2016-10-06 | 07:00:25 AM | station$147 | station$150 | ICWYR | whber | 4.356801 | 50.845658 | 634.164740 | whber | 4.336531 | 50.835707 | 686.615607 | 2016 | 40 | Thursday | False |
| 8 | zdwfnxlivjlitzd | 2016-10-06 | 07:04:27 AM | station$296 | station$147 | ICWYR | whber | 5.599695 | 50.613152 | 87.130058 | whber | 4.356801 | 50.845658 | 634.164740 | 2016 | 40 | Thursday | False |
| 9 | wznosynddwsawbv | 2016-10-06 | 07:07:43 AM | station$266 | station$147 | ICVYR | whber | 4.715866 | 50.882280 | 351.916185 | whber | 4.356801 | 50.845658 | 634.164740 | 2016 | 40 | Thursday | False |
Last rows
| | id_code | current_date | current_time | source_name | destination_name | train_name | country_code_source | longitude_source | latitude_source | mean_halt_times_source | country_code_destination | longitude_destination | latitude_destination | mean_halt_times_destination | current_year | current_week | current_day | is_weekend |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 954 | rzumrsbcrnuxzag | 2016-10-29 | 07:26:37 AM | station$214 | station$178 | ICRYZR | whber | 4.482785 | 51.017648 | 306.523121 | whber | 4.421101 | 51.217200 | 467.982659 | 2016 | 43 | Saturday | True |
| 955 | pxmhcvnwktxqukn | 2016-10-29 | 08:03:44 AM | station$241 | station$247 | ICYRYR | whber | 5.327627 | 50.930822 | 180.598266 | whber | 5.050031 | 50.993341 | 84.919075 | 2016 | 43 | Saturday | True |
| 956 | cugnfjqcwwqrjhu | 2016-10-29 | 08:55:57 AM | station$272 | station$209 | ICYRYR | whber | 4.824043 | 50.984406 | 123.800578 | whber | 4.708235 | 51.074146 | 53.060694 | 2016 | 43 | Saturday | True |
| 957 | bxhlrxcgiapiaab | 2016-10-29 | 08:56:07 AM | station$200 | station$185 | ICYRYR | whber | 4.560614 | 51.135758 | 154.413295 | whber | 4.432221 | 51.199230 | 421.644509 | 2016 | 43 | Saturday | True |
| 958 | uunreizjxarghpv | 2016-10-29 | 09:14:01 AM | station$200 | station$185 | ICYRYR | whber | 4.560614 | 51.135758 | 154.413295 | whber | 4.432221 | 51.199230 | 421.644509 | 2016 | 43 | Saturday | True |
| 959 | pnfrvyxsejnehwu | 2016-10-29 | 09:14:45 AM | station$544 | station$530 | ICZVXA | whber | 3.710675 | 51.035896 | 309.014451 | whber | 3.447848 | 51.092295 | 78.488439 | 2016 | 43 | Saturday | True |
| 960 | omsilbnrgbvkeak | 2016-10-29 | 10:17:59 AM | station$530 | station$544 | ICZVZA | whber | 3.447848 | 51.092295 | 78.488439 | whber | 3.710675 | 51.035896 | 309.014451 | 2016 | 43 | Saturday | True |
| 961 | vkjvqmaaguaeqde | 2016-10-29 | 10:39:10 AM | station$178 | station$147 | ICRYYW | whber | 4.421101 | 51.217200 | 467.982659 | whber | 4.356801 | 50.845658 | 634.164740 | 2016 | 43 | Saturday | True |
| 962 | iutnjhogthfpymb | 2016-10-29 | 10:59:55 AM | station$147 | station$150 | ICZVXY | whber | 4.356801 | 50.845658 | 634.164740 | whber | 4.336531 | 50.835707 | 686.615607 | 2016 | 43 | Saturday | True |
| 963 | xwqxedeqlnimclu | 2016-10-29 | 11:48:37 AM | station$525 | station$536 | ICZVXW | whber | 3.216726 | 51.197226 | 164.419075 | whber | 3.133864 | 51.312432 | 21.416185 | 2016 | 43 | Saturday | True |